{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "134a0785",
   "metadata": {},
   "source": [
    "# 揭秘链的复杂性\n",
    "\n",
    "随着链需要承担的责任越来越多，链的内部结构也会变得越来越复杂。这是因为链需要处理更多的任务，需要更多的组件来完成这些任务，而这些组件又可能是其他的链。因此，当我们需要处理更复杂的业务场景时，我们可能需要使用到更复杂的链。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "17e86bea",
   "metadata": {},
   "source": [
    "这次教程为说明链之间的关系，流程是首先使用数据连接模块中的LEDVR工作流（这个代码同 ../Data_connection/LEDVR的示例代码.ipynb)，获取到相关文档，然后使用 load_qa_chain, 跟这个文档进行聊天。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c63c1370",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install langchain\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8f64c8cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n\\n你好！'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 此为测试密钥和环境的代码。\n",
    "from langchain.llms import OpenAI\n",
    "openai_api_key=\"\"\n",
    "\n",
    "llm = OpenAI(openai_api_key=openai_api_key)\n",
    "llm.predict(\"你好\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2d915367",
   "metadata": {},
   "source": [
    "每次开始之前，最好用模型包装器测试一下自己的密钥和环境是否配置正确。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0eb65f69",
   "metadata": {},
   "source": [
    "首先，我们使用文档加载器L，创建一个WebBaseLoader实例，用于从网络加载数据。在这个例子中，我们加载的是一个博客文章。加载器读取该网址的内容，并将其转换为一份文档数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0fe5ed38",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import WebBaseLoader\n",
    "loader = WebBaseLoader(\"http://developers.mini1.cn/wiki/luawh.html\")\n",
    "data = loader.load()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2c6f866c",
   "metadata": {},
   "source": [
    "随后，我们使用文本嵌入模型embedding，将这些分割后的文本数据转换为向量数据。我们创建一个OpenAIEmbeddings实例，用于将文本转换为向量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a1748774",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "\n",
    "embedding = OpenAIEmbeddings(openai_api_key=openai_api_key)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2acd2bd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
    "splits = text_splitter.split_documents(data)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c60b5720",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "79"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(splits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "fb7bd707",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<langchain.vectorstores.faiss.FAISS at 0x18ab4774690>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "from langchain.vectorstores import FAISS\n",
    "vectordb = FAISS.from_documents(documents=splits,embedding=embedding)\n",
    "vectordb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "70c4e529",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.chains import ConversationalRetrievalChain"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cdff94be",
   "metadata": {},
   "source": [
    "你可以选择其他加载器。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "898b574b",
   "metadata": {},
   "source": [
    "我们加入记忆模块的记忆，保存对话的历史记录。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "af803fee",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.memory import ConversationBufferMemory\n",
    "\n",
    "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3c96b118",
   "metadata": {},
   "source": [
    "实例化 `ConversationalRetrievalChain`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7b4110f3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "\n",
    "retriever = vectordb.as_retriever()\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4006af5c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VectorStoreRetriever(tags=None, metadata=None, vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x0000018AB4774690>, search_type='similarity', search_kwargs={})"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aafb8a9d",
   "metadata": {},
   "source": [
    "实例化 ConversationalRetrievalChain 链。这个链负责会话和检索相关文档。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "754f265d",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(openai_api_key=openai_api_key)\n",
    "qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "115ba3e5",
   "metadata": {},
   "source": [
    "我们可以开始提问了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e8ce4fe9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 标识符用于定义一个变量，函数或其他用户定义的项。它以一个字母A到Z或a到z或下划线_开头后跟0个或多个字母，下划线，数字（0到9）。Lua不允许使用特殊字符如@，$和%来定义标识符。'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question = \"什么是标识符?\"\n",
    "result = qa({\"question\": question})\n",
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "4c79862b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 全局变量、局部变量、表中的域。'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "c697d9d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Lua 变量有三种类型？\"\n",
    "result = qa({\"question\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "ba0678f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' 全局变量、局部变量、表中的域。'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "99b96dae",
   "metadata": {},
   "source": [
    "## 合并文档的类型设置为 `map_reduce` 类型\n",
    "\n",
    "ConversationalRetrievalChain 默认的是  `stuff` 类型， 根据不同的需求，可以配置不同的类型，算法不一样，最终的结果也不一样。\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e53a9d66",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains import LLMChain\n",
    "from langchain.chains.question_answering import load_qa_chain\n",
    "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "bf205e35",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "\n",
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n",
    "\n",
    "chain = ConversationalRetrievalChain(\n",
    "    retriever=vectordb.as_retriever(),\n",
    "    question_generator=question_generator,\n",
    "    combine_docs_chain=doc_chain,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "78155887",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"LUA 是什么？\"\n",
    "result = chain({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e54b5fa2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Lua is a lightweight, scripting language written in standard C and open in source code form. It is designed to be embedded into applications, providing applications with flexible extension and customization capabilities.'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a2fe6b14",
   "metadata": {},
   "source": [
    "## 使用QA SOURCE 链条，ConversationalRetrievalChain\n",
    "\n",
    "我们可以使用load_qa_with_sources_chain，我们可以看到答案的来源位置。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "d1058fd2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "a6594482",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n",
    "\n",
    "chain = ConversationalRetrievalChain(\n",
    "    retriever=vectordb.as_retriever(),\n",
    "    question_generator=question_generator,\n",
    "    combine_docs_chain=doc_chain,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "e2badd21",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"当变量个数和值的个数不一致时，Lua会一直以变量个数为基础采取什么策略？\"\n",
    "result = chain({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "edb31fe5",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Lua会一直以变量个数为基础，当变量个数多于值的个数时，会按变量个数补足nil，当变量个数少于值的个数时，多余的值会被忽略。\\nSOURCES: http://developers.mini1.cn/wiki/luawh.html'"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
   "metadata": {},
   "source": [
    "## 使用 `stdout` 的 ConversationalRetrievalChain \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.llm import LLMChain\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "from langchain.chains.conversational_retrieval.prompts import (\n",
    "    CONDENSE_QUESTION_PROMPT,\n",
    "    QA_PROMPT,\n",
    ")\n",
    "from langchain.chains.question_answering import load_qa_chain\n",
    "\n",
    "# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
    "# and a separate, non-streaming llm for question generation\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)\n",
    "streaming_llm = OpenAI(\n",
    "    streaming=True,\n",
    "    callbacks=[StreamingStdOutCallbackHandler()],\n",
    "    temperature=0,\n",
    "    openai_api_key=openai_api_key,\n",
    ")\n",
    "\n",
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
    "\n",
    "qa = ConversationalRetrievalChain(\n",
    "    retriever=vectordb.as_retriever(),\n",
    "    combine_docs_chain=doc_chain,\n",
    "    question_generator=question_generator,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " While循环, for循环, repeat...until, 和循环嵌套."
     ]
    }
   ],
   "source": [
    "chat_history = []\n",
    "query = \"Lua的循环语句有哪些？\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " for循环是一种数值for循环，它通过指定一个变量从一个值变化到另一个值，每次变化以一个步长递增变量，并执行一次\"执行体\"。"
     ]
    }
   ],
   "source": [
    "chat_history = [(query, result[\"answer\"])]\n",
    "query = \"for循环是什么？\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
